Form Design for High Accuracy Optical Character Recognition
Authors
Michael D. Garris, Darrin L. Dimmick
National Institute of Standards and Technology, Building 225, Room A216, Gaithersburg, Maryland 20899
Phone: (301)975-2928, FAX: (301)840-1357
Published in IEEE Transactions PAMI, June 1996.
Abstract
Financial institutions, insurance companies, and government agencies are all aggressively pursuing the integration of automated forms processing into their everyday work flows. To use existing optical character recognition (OCR) technology, the forms that are currently hand-keyed will probably need to be redesigned. This paper presents some of the quantitative results generated by a comprehensive study of three versions of a redesigned tax form. Analyses show that using separately spaced bounding character boxes to represent fields provides superior machine readability over fields without character boxes, fields containing vertical ticks (combs), and fields with adjoining character boxes. It is also shown that character boxes containing two vertically stacked ovals are much more difficult for writers to complete than empty character boxes. The analyses also provide quantitative proof that writers' idiosyncratic responses on forms are the major source of errors within the recognition system. These idiosyncrasies (such as writers crossing out previously printed characters or writing over them) must be handled effectively in order to improve recognition performance. This paper demonstrates how form design can help, and it provides empirical data to support some of the rules of thumb by measuring the impact that specific changes to a form have on machine readability and on the writer.
Similar resources
High accuracy handwritten Chinese character recognition using LDA-based compound distances
Article history: Received 9 December 2007 Received in revised form 11 April 2008 Accepted 15 April 2008
A Robust Free Size OCR for Omni-Font Persian/Arabic Printed Document Using Combined MLP/SVM
Optical character recognition of cursive scripts presents a number of challenging problems in both the segmentation and recognition processes, and this attracts much research in the field of machine learning. This paper presents a novel approach based on a combination of MLP and SVM to design a trainable OCR for Persian/Arabic cursive documents. The implementation results on a comprehensive databas...
Important New Developments in Arabographic Optical Character Recognition (OCR)
Leipzig University’s (LU) Alexander von Humboldt Chair for Digital Humanities—has achieved Optical Character Recognition (OCR) accuracy rates for classical Arabic-script texts in the high nineties. These numbers are based on our tests of seven different Arabic-script texts of varying quality and typefaces, totaling over 7,000 lines (~400 pages, 87,000 words; see Table 1 for full details). The...
Neural Network Based Recognition System Integrating Feature Extraction and Classification for English Handwritten
Handwriting recognition has been one of the active and challenging research areas in the field of image processing and pattern recognition. It has numerous applications, including reading aids for the blind, bank cheques, and conversion of any handwritten document into structural text form. Neural Network (NN) with its inherent learning ability offers promising solutions for handwritten characte...
Design of an Optical Character Recognition System for Camera-based Handheld Devices
This paper presents a complete Optical Character Recognition (OCR) system for camera captured image/graphics embedded textual documents for handheld devices. At first, text regions are extracted and skew corrected. Then, these regions are binarized and segmented into lines and characters. Characters are passed into the recognition module. Experimenting with a set of 100 business card images, ca...
Journal title:
- IEEE Trans. Pattern Anal. Mach. Intell.
Volume 18, Issue
Pages -
Publication date 1996